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ABSTRACT 

We assess the detectability of a nanohertz gravitational wave (GW) background 
with respect to additive white noise and especially red noise in the timing of millisec- 
ond pulsars. We develop detection criteria based on the shape and amplitude of the 
cross-correlation function summed over pulsar pairs in a pulsar timing array. The distri- 
bution of correlation amplitudes is found to be non-Gaussian and highly skewed, which 
significantly influences the detection and false-alarm probabilities. When only white 
noise combines with GWs in timing data, our detection results are consistent with those 
found by others. Red noise, however, drastically alters the results. We discuss methods 
to meet the challenge of GW detection ("climbing mount significance") by distinguish- 
ing between GW-dominated and red or white-noise limited regimes. We characterize 
plausible detection regimes by evaluating the number of millisecond pulsars that must 
be monitored in a high-cadence, 5-year timing program for a GW background spectrum 
h c (f) = Af~ 2 / 3 with A = 10 -15 yr~ 2 / 3 . Our results suggest that unless a sample of 20 
super-stable millisecond pulsars can be found — those with timing residuals from red- 
noise contributions a r < 20 ns — a much larger timing program on > 50 — 100 MSPs 
will be needed. For other values of A, the constraint is a r < 20 ns (A/10" 15 yr~ 2 / 3 ). 
Identification of suitable MSPs itself requires an aggressive survey campaign followed 
by characterization of the level of spin noise in the timing residuals of each object. 
The search and timing programs will likely require substantial fractions of time on new 
array telescopes in the southern hemisphere as well as on existing ones. 



1. Introduction 

There is current strong interest in exploiting the spin stability of millisecond pulsars (MSPs) to de- 
tect gravitational waves (GWs) at nanohertz frequencies (~ 0.1 — 1 cy yr _1 ). Sources of GWs in this 
band include mergers of supermassive black holes that collectively produce an isotropic background 



- 2 - 



(jJaffe k Backed 12003 : IPhinney 
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Jenet et al 



may be detected in dividually (jLommen &; Backer 



2005 ; ISesana et al.1 I2008T) and in a few cases 



2001 



Jenet et al 



2004; 



Finn fc Lommen 



2010; 



Yar dley et al.ll2010l ). Timing may also de tect GW backg rounds from cosmic strings (jOlmez et al 



2010h . the influence of massive gravitons (|Lee et al.ll2010l ) or solar system perturbations from pri- 



mordial black holes (ISeto fc Coorayl 120071 ) . Detection methods have been based on finding excess 
variance in the timing residuals of individual sources, investigating spectral signatures in power 
spectra, or identifying the angular correlation expected between pulsar pairs from GWs passing 
through the solar system. Astrophysical and instrumental processes limit the timing precision of 
any given pulsar and the overall sensitivity of a pulsar timing array (PTA). 

However, the efficacy of detection methods has received very uneven assessment with respect to 
contamination from different kinds of additive noise. These include both white and red noise 
processes, the latter having power strongly concentrated at lower fluctuation frequencies. We 
evaluate their impact on the sensitivity to a stochastic GW background, which itself comprises a 
red noise process. 

In this paper, we are concerned with detection of GWs as distin guished from their detailed char 
acterization. Recent work has consi dered both frequentist (e.g . iJenet et al.l 120051 ; lYardlev et al 



2011 



and Bayesian approaches (e.g. Ivan Haasteren et al.l |2009|, 



2011 



to the detection problem. 
While these methods are robust to varying degrees, our view is that detection needs to be cor- 
roborated with convincing diagnostics. We draw an analogy with the detection of a new spectral 
line or detection of cosmic microwave background fluctuations. Bayesian inference can yield the 
best probabilistic constraints on signal parameters, but most observers are convinced of an un- 
derlying detection and characterization only with the display of a spectral line or the power vs. 
spherical harmonic number that indicates a significant signal with respect to measurement errors. 
The corresponding quantity in the pulsar-GW problem is the cross correlation between the timing 
residuals of pulsar pairs or some related quantity. 

We derive a general expression for the signal-to-noise ratio (SNR) of a correlation-based detection 
statistic and develop a detection protocol based on the shape and amplitude of the cross-correlation 
function. We assess the challenges for a likely detection and estimate the minimum number of MSPs 
needed under different circumstances. To do so, we consider a hypothetical pulsar distribution 
that yields the highest possible SNR for the correlation function, all else being equal. This is a 
configuration where N p pulsars are in the same direction but at different distances so that the 
perturbation from GWs passing through the solar system is 100% correlated between all objects. 
Any other configuration will yield smaller correlation and SNR. 

The correlated effect on ti mes of arrival (TOAs) f or different pulsars is produced by GWs passing 
through the solar system (jHellings fc Downslll983l ). We designate the Hellings and Downs angular 
correlation function as £(0) for two objects separated by an angle 6, with normalization ("(0) = 1; 
this differs from other definitions in the literature that include a delta function associated with 
GWs at the pulsar location, which yields C(0 + ) = 1/2. For our purposes, we keep the pulsar term 
separate. 

We assess de tectability in terms o f diffe rent levels of white and red noise in timing residuals. Our 
work follows ICordes fc Shannon! (120101 ) where we assess a wide range of contributions to TOA 
errors from the pulsar, the interstellar medium, and from instrumental effects. White noise not 
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only includes radiometer noise but also pulse-to-pulse phase jitter from magnetospheric activity 
and from an effect associated with interstellar scintillation, which are pulsar and line-of-sight (LOS) 
dependent, respectively. We have also shown that red spin noise, which is common in canonical 
pulsars — those with periods of order one second and surface magnetic fields ~ 10 12 G — is also 
to be expected in MSPs but at low levels in accord with their spin parameters ((Shannon &: Cordes 



2010 ) . Interstellar scintillation also contributes red-noise TOA perturbations, some of which can 



be corrected before any analysis for GWs. 

In the next section we describe the cross correlation analysis of a simplified timing model and 
develop detection criteria for assessing the presence of a GW signal. In § [3] we describe how 
the prospects for detection can be maximized. We summarize our results and discuss them in 
broader terms in § HI The Appendix defines quantities used to characterize the timing residuals 
and describes simulations. 



2. Cross Correlation Detection 

We use a simplified model for the timing residuals of a pulsar by excluding real -world effects such 



as tim e transfer and the error in the location of the solar-system barycenter (e.g. iBacker &; Hellings 



1986 ). 

x(t) = e(t) + u(t), (1) 

where e(t) is the "Earth" part of the gravitational wave (GW) background and u(t) includes all 
other processes, which we assume are uncorrelated between different pulsars. At minimum, u(t) 
includes the GW perturbation p(t) at the pulsar's location that acted on the measured signal a 
time D/c earlier, where D is the pulsar's distance. Later we expand u(t) into three components 
that are uncorrelated with e(t) and with each other, 

u(t)=p(t) + r{t)+n(t), (2) 

where r(t) is red noise associated with spin noise ("timing noise") in the pulsar or with multipath 
propagation in the interstellar medium (ISM), and n(t) is white noise that represents measurement 
errors of different kinds. Each is characterized by a correlation function a x (T)p x (t,t'), where cr x (T) 
is the ensemble-mean variance over an interval [0, T] and p x (t,t') is the normalized correlation 
function defined with two arguments to handle non-stationary as well as stationary processes. 

Figure [T] (left panel) shows simulated time series for a 20-MSP PTA. The time series show the e 
terms to be identical for all 20 pulsars, as assumed, while the p and r terms are different. The sum 
of all terms and the residuals from a second-order polynomial fit are also accordingly different. 
The bottom row of the panel shows sums of the various terms in the first five columns, which 
demonstrate the reduction in rms by l/\/20 for the p and r terms. In the right-hand panel CCFs 
for 20 realizations of the PTA are shown where the GW, red-noise and white-noise contributions 
are equal over the data span T, corresponding to ip = 1 and a e = a p = a r = a n in one set of curves 
(the left column), while the GWs are turned off for the curves in the right-hand column. 

The model in Eq. CE])-© is a sum of Gaussian distributed processes because all terms originate 
from conditions that usually will satisfy the central limit theorem (CLT). Focusing events in the 
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Time 

Fig. 1. — (Left) Simulated time series for a 20-pulsar timing array over 5 years. The columns 
from left to right show the e,p and r terms in the timing model, their sum added to white noise 
(n), and the residuals A(e+p + r + n) from a second-order polynomial fit. The simulations include 
GWs, red noise and white noise that have identical rms values a gw = cr r = a n = 20 ns. in the 
post-fit time series. The bottom row shows the average of each column over the 20 pulsars. (Right) 
CCFs for a 20-pulsar PTA after removing a quadratic fit to each residual time series before cross 
correlating between all pairs. The left-hand column shows results for 20 realizations of the PTA 
shown in the left-hand panel. The right-hand column shows cases with no GW contribution. 



interstellar medium (jColes et al.ll2010l ) or events intrinsic to the pulsar can plausibly induce non- 
Gaussian statistics in data from some objects. The cross-correlation between pulsars, being a 
second moment, is a sufficient statistic for the angular correlation function. We define the general 
temporal cross-correlation function (CCF) in the Appendix and focus attention here on its zero-lag 
value 



a 



00 



C{9 



°' T = 0) = it S ( ' ij{0ij = °' r = °) = jfr S / dtx ^ x iW> 

i<j i<j 



(3) 



which defines the single-pair CCF and the number of unique pairs in the double sum over i,j 
is Nx = N p (N p — l)/2 for N p pulsars. We have used continuous time notation; we justify this in 
the Appendix where we also discuss how to treat discretely sampled data. The ensemble average 



is 



a 



00 



dte 2 {t)) = V^gw(r). 



(4) 



We use ip to denote the ratio of the variances of the actual Earth term and the ensemble average 
variance. For red processes with power-law spectra oc f~ y , the range of ij) covers one to a few orders 
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Fig. 2. — (Left) Histograms of the zero-lag values of cross-correlation functions based on simula- 
tions of PTAs that use 20 pulsars. The mean of 2000 realizations is subtracted from each value and 
the result is normalized by the rms deviation, oq. Heavy lines show the histogram of Coo calculated 
using the pre-fit time series and the light lines show post-fit results. In the top panel GWs do not 
contribute to the timing residuals while in the bottom panel they contribute cr gw = 20 ns after a 
polynomial fit. Uncorrelated red and white noise contribute post-fit rms values a r = a n = 20 ns 
to each time series. Note that the the histograms in the bottom panel have been shifted by the 
average, post-fit signal to noise ratio, which for this case is S = (Coo)/a^ = 2.5, where oq is 
calculated from the simulations. The histograms in the upper panel have (Coo) = 0. (Right) Plot 
of metrics m\ vs. rri2 that characterize the shape of the cross correlation function of the post-fit 
timing residuals, as defined in the text; points are shown for 100 realizations of each case. The 
dashed lines denote the minimum threshold for the vertical axis and the maximum departure of 
the CCF peak from zero lag in order to provide a plausible detection. The open circles are for a 
case with no GW contribution while the open squares and filled circles have a post-fit GW rms of 
20 ns in each time series. 



of magnitude, with steeper power laws showing a wider range, as discussed in the Appendix. This 
implies that the GW background yields an actual rms TOA perturbation that can vary by more 
than a factor of ten. We define the signa l to noise ra t io S = Cqq/g^ in 



Our definition for 

S has some similarity to that defined by Uenet et al.l (|2005l ) but with a crucial difference. Their 
Eq. 4 is essentially a matched filter based on the Hellings and Downs angular correlation. In our 
notation, the numerator of their equation is (with subscript "J" for Jenet et al.) 



Cj 



-Ef 



c 



(5) 



The angular separation of the i-th and j-th objects is Oij and barred quantities are sample means 
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over all pulsar pairs of Cij and Qj = ((Oij), respectively. The ensemble mean is 

(Cj) = a g 2 w - f) , (6) 

where 

e = ^-Ea%). (t) 

X i<j 

For the compact pulsar configuration we consider, (Cj) vanishes because C, 2 = ( = 1 and thus 
cannot be used to quantify detection in this case. If we redefine the weighted correlation as 

Cj = ]y^y^ij(flij)Cij) ( 8 ) 

i<j 

the Jenet et al. test statistic becomes 

S'j = (9) 

which is identical to our definition for 5 when the pulsar configuration is compact with 6^ = 0. 

Later in the paper we will relate our results to an arbitrary configuration of pulsars by considering 
S'j, which simply multiplies our result for S by the mean-square angular correlation over the 
sample, C 2 . 



2.1. Detection Criteria 

Over an ensemble, the CCF vanishes unless there is a significant correlated term from the e(t) 
term (or from errors in the location of the solar system barycenter or from instrumentation that 
we do not include in our analysis). However, deviations from ensemble-average statistics in real 
data will produce both false positive and false negative detections from the uncorrelated terms in 
the timing residuals, like those shown in Figure [TJ 

A detection protocol for GWs can exploit the following aspects of timing residuals and their 
correlations: 



The timing residuals must i nclude a red-noise process caused by one or more of the predicted 
isotropic GW backgrounds (| Jenet et al.ll2005l ). If GWs from any discrete source are signif- 
icant, there should be a corresponding departure f rom white-noise statistics described by a 
spectrum that depends on the nature of the GWs ( Lommen $z Backer! boOll ). 



The maximum of the CCF is at or near zero time lag, r ~ 0, depending on how strong 
the GWs are relative to other contributions. Uncorrelated contributions produce estimation 
errors in Coo that peak at arbitrary time lags in estimates using a finite number of pulsars 
and thus can induce false non-detections and false positive detections. 



3. 



The zero-lag amplitude of the CCF must be significantly larger than expected when only 
uncorrelated terms contribute to the time series. 
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4. The correlation between pulsars is consistent with that expected (e.g. Eq. 5 of Hellings & 
Downs, 1983) for an isotropic background or the equivalent angular correlation for a discrete 
source. 

5. Any correlation established using one set of pulsars can be checked using a completely inde- 
pendent set of pulsars. 

When white and red-noise processes are significant, the estimated CCF has a high probability of 
peaking at a non-zero time lag. If white noise dominates the timing residuals of all pulsars, the 
CCF itself will vary rapidly with time lag and its formal maximum could be at any lag. Red noise 
by definition has a long correlation time so the CCF can appear quite smooth and yet peak far 
away from the origin. After a second order polynomial fit, the red-noise residuals will typically 
have two zero crossings so the CCF maximum is likely to be more centered than the pre-fit CCF, 
an effect we see in simulations. Nonetheless, the full CCF provides important statistical tests of 
the zero lag value. 

Figure [2] (left-hand panel) shows histograms of Coo obtained from simulations with and without a 
GW contribution and for both pre-and-post-fit cases. Under relevant circumstances, the distribu- 
tion is asymmetric, with a long tail for positive values while the mode and median are less than 
the mean. This counterintuitive result, discussed in the Appendix, occurs when the correlated 
quantity includes a red process with a steep power spectrum. The time series for a red process 
is effectively dominated by approximately one independent fluctuation so that the calculation of 
the CCF for one pair of pulsars does not satisfy the requirements for the CLT to apply, as it 
would if the time series were only white noise. The sum over all pulsar pairs also does not satisfy 
the requirements in part because the number of pairs exceeds the number of individual objects. 
Simulations show that the distribution for Coo becomes increasingly skewed for steeper spectra 
and for larger N p . We have not yet explored if there is an asymptotic form for the distribution. 
The skewness we have identified is similar in cause to that identifie d for studies of no n-Gaussianity 



of temperature fluctuations in the cosmic microwave background ([Smith et al.ll201ll ). 

The long tail influences the false-alarm rate sigificantly. One way to get a more symmetric distri- 
bution is to average multiple estimates of the cross correlation function. Multiple estimates can 
be obtained by subdividi ng timing residuals into M blocks, each of length T/M, as mentioned in 



([Shannon Sz Cordesll2010l ). The CLT will apply to the average so the distribution should tend to 



a Gaussian form. We discuss this further in § 12.31 



2.2. CCF Based Detection 

We define two metrics that characterize the shape of the CCF, 



mi 



c 



oo 



C n 



m 2 



(10) 



mi is the ratio of the CCF at zero lag to the most negative value; m-2 is the offset of the CCF 
maximum relative to the maximum calculated lag, which in our simulation is T max — T/2. The two 
metrics characterize the shapes of the CCFs without reference to the actual signal to noise ratio 
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of the GW signal and therefore provide a direct empirical mechanism for assessing the presence of 
a GW signal. We later define the signal to noise ratio S as an ensemble average quantity that can 
be related to physical models for the GWs. The metric mi is a similar measure but is based on a 
single realization of the correlation estimate and is normalized by the minimum of the particular 
CCF, not the rms value over an ensemble. Figure [2] (right panel) shows a scatter plot of mi 
and m,2 from simulations that displays a peak in mi near r = 0. The fraction of points in the 
peak depends on the strength of the GWs compared to other contributions and on the number of 
pulsars. For residuals dominated by uncorrelated red noise, there is a sizable fraction (~ 15 %) 
of cases with peak values that can mimic a GW detection. We define a joint detection criterion 
that comprises a lower bound mi > mi min and an upper bound \m2\ < m2 max . We also define the 
detection fraction fd as the fraction of PTA realizations in a simulation that satisfy the detection 
criteria. The corresponding false-alarm fraction is defined as the detection fraction in the absence 
of any GW signal. 




Fig. 3. — (Left) ROC curves showing detection fraction vs. false-alarm fraction for different 
detection criteria obtained by varying the thresholds for mi and m2- The detection fractions were 
calculated for the same GW strength (20 ns over 5 yr) but different numbers of pulsars and for 
different levels of red and white noise noise, as labeled. The false-alarm fraction was obtained by 
turning off the GW signal and keeping the red and white-noise levels the same as for the GW "on" 
case. (Right) Detection fraction plotted against the expected signal-to-noise ratio, S, of the cross 
correlation function. The five points for each curve correspond to PTAs with N p = 4, 8, 20, 50 and 
100 pulsars. Solid lines: The rms red and white noise are equal as labelled for each curve near 
the point corresponding to N p = 4. Dashed line: A case with white noise with 100 ns rms (no red 
noise) added to the GW signal. 



Inspection of Figure [2] suggests a threshold mi min = 1, which is plausible since the CCF of a noise- 
only signal is likely to have approximately equal positive and negative-going excursions. Also 
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consistent with the figure is a threshold "i2max = 0.1, which enforces the zero time-lag nature of 
the "Earth" part of the GW signal but rejects cases where noise processes steer the CCF maximum 
away from zero lag. 

The right-hand panel in Figure Q] shows cases where the CCFs satisfy the detection criteria (large 
amplitude and peak at r = 0) and others that do not. Conversely, when there is no GW contribu- 
tion, red noise can cause false positive cases that satisfy the criteria. We quantify these features 
in the discussion that follows. 

Specifying a detection criterion requires consideration of the tradeoff between the detection frac- 
tion, fa, and the false-alarm fraction, ft a . Figure [3] (left panel) shows "ROC" (receiver operating 
characteristics) curves calculated by varying mi m - m and ?ri2 max to alter the detection and false- 
alarm fractions. Each curve corresponds to a particular PTA (number of pulsars and levels of red 
and white noise). Ideally, one would like to have 100% detection fraction with no false alarms. 
The cases (N p , a r , <r n ) = (50, 10 ns, 20 ns) and (N p , a T , a n ) = (20, ns, 100 ns) come closest to this 
ideal. All of the other cases, which have larger noise levels or smaller numbers of pulsars in the 
PTA depart significantly from the ideal. With 20 pulsars having 20 ns of red noise and negligible 
white noise, a 90% detection fraction comes at the expense of a 11% false-alarm fraction. A larger 
number of pulsars (such as 50 pulsars with 20 ns red noise or 100 pulsars with 50 ns of red noise) 
decreases the false-alarm fraction to 8%. 

The mapping of signal-to- noise ratio S and detection fraction is shown in Figure [3] (right panel) . 
Most of the curves shown are for equal levels of red and white noise and different pulsar numbers 
(solid curves). Detection fractions fd > 0.8 require S > 1.5 and fd > 0.95 requires S > 2. A case 
with 100 ns white noise (dashed curve) shows that N p = 20 pulsars yields S ~ 3 and a detection 
fraction > 80% and minimal false-alarm f raction as shown i n the left-hand panel. Our results are 
therefore broadly consistent with those of lJenet et al.l (|2005l ). who consider only white noise TO A 
errors. The primary conclusion from Figure [3] is that red noise drastically alters the detection and 
false-alarm fractions and therefore also any assessment of GW detect ability. 



2.3. SNR of the Zero-lag Cross Correlation 

The detection criteria defined above are based solely on the shape of the cross-correlation function. 
It is useful to relate the detection fraction to the ensemble-average SNR of the correlation function 
(at zero lag), because the SNR can be related to the GW spectrum and properties of the PTA. 
The SNR of the CCF is defined as 

S= 9^ = ^k, (11) 
°c °c 

where the rms variation gq is given in the Appendix and includes the contributions from the GWs 
themselves along with uncorrelated red and white noise. For arbitrary combinations the SNR is 

-1/2 

(12) 



y/^NpM J (w gg + i 2 M W rr + 2£ M W gr ) 7] M M 
S = \ W 99 + ^Wrr + ^A^l) + ~W 



l {vm/Ns + 2 + 2£ M ) 



2^{N P - 1) 



Simulations yield SNRs that agree with this expression to within statistical errors. In Eq. (|12|) we 
have allowed for the division of the full time series into M blocks and incoherent averaging of the 
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CCF for each block, reducing the variance by 1/M. Both a u and cr gw depend implicitly on M, as 
discussed below. The SNR depends on rms red and white noise levels through the variance ratios 
Vm = &n/ a gw an d Cm = ( T r/ a gw that depend on the time span T/M owing to the nonstationarity 
of the red noise and GW signals. The number of blocks also enters in the leading coefficient in 
Eq. (I12p . because the correlation estimate from each block is averaged and we assume that the 
estimates are statistically independent. We have verified that red processes with spectral indices 
of four or less show this statistical independence. For steeper spectra, there is some correlation 
between blocks. 

For a dimensionless strain amplitude spectrum h c (f) = Af" 9 , the spectrum of timing residuals 
<x / 2 " 9_3 and the rms residual scales as a,r W (T) oc T Xg with x 9 = 1 — «g for a g < 1. For the GW 



background produced by merging SMBHs (jJaffe h Backer! 120 03). a g = —2/3 and x g = 5/3. We 



will use a fiducial value A = 10 15 yr 2//3 . Similarly, red timing n oise has been characterize d with 



exponents x r ~ 2 ± 0.2 corresponding to a spectrum oc J _5±0 - 4 ( Shannon fc Cordesll2010l ). The 
dimensionless ratios then become t]m = r\\M 2Xa and £m = £iM 2 ( X9 ~ Xr \ where rji and £i are the 
values for the full-length time series (M = 1). 

The quantities w gg ,w rr and w gr are dimensionless correlation times that are defined in the Ap- 
pendix. For steep power-law spectra, they are of order unity and independent of the data-span 
length owing to self-similarity. This same statement holds whether we consider the timing residual 
model in Eq. ([1]) to represent pre-fit residuals or those after removing a polynomial to account for 
corrections to the spin parameters. Inclusion of astrometric sinusoidal terms in the fitting function 
with one year and half-year periods will induce some dependence of the dimensionless scales on T 
but with diminishing importance as T ^> 1 yr. The equivalent quantity for white noise is 1/JVt 
because adjacent samples are uncorrelated. 

Smoothing (low-pass filtering) and decimation of the time series by N s samples before correlation 
reduces the rms of the white- noise. We consider cases where N s <C Nt/M so that with blocking 
there is still a large number of samples per block. We also consider the implied smoothing time 
N S T/N t to be small enough that it does not reduce the variance of the red processes over the block 
length T/M. 

The simplest case, though unrealizable, is where only GWs contribute to the timing residuals with 
the e term perfectly correlated and the p term completely uncorrelated. The SNR is 



S-l 
b ~ 2 



iPN p M/w gg 1 1/2 1 UN P M 



1 + l/2i,{N p - 1) 



w, 



(13) 



99 



the approximate equality holding when the number of pulsars is very large, ipN p 3> 1. The 
SNR grows as y/N p and can become arbitrarily large with \J~M subject to the requirement that 
the continuum approximation holds for arbitrarily small T/M. When white noise or red noise 
contribute, however, there is a distinct maximum in S vs. M. 

A number of other features are evident in Eq. (|12p . which has terms inside the curly brackets 
scaling as 0(1),1/N p ,l/N t and l/(N p N t ). The number of TOAs, N t , is important only as long 
as the white noise part of the residuals is sizable. If less than other terms, the number of TOAs 
- and thus any cadence in acquiring them — becomes unimportant. A larger number of pulsars 
can reduce the effects of both the red and white noise from non-GW contributions in addition to 
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reducing the uncorrelated "pulsar" part of the GWs. For very large Nt and N p , the SNR reduces 
to that for the GW-only case. 

We illustrate these and other trends in Figure HI In the left panel, the SNR is plotted against the 
number of blocks for cases that include red and white noise added to the GW perturbation. The 
plotted values are based on ip = 1 and on a total of N t = 10 3 TOAs over five years for 100 pulsars. 
We have dimensionalized the values for t]m using A = 10~ 15 . After a second o rder fit to a T = 5 yr 
data span, the rms residual is cXg^T) 20 ns (e.g. IShannon fe; Cordesll2010l ). 

The curve for no noise (red or white) increases monotonically with M, but there is a distinct 
maximum SNR for non-zero noise that separates the GW dominated and noise dominated regimes. 
The optimal number of blocks is M ~ 3 to 6 for the cases shown. In the noise-dominated regime, 
the SNR scales as ipNp^J NtN s M~ 2x<3 , so smoothing improves the SNR but blocking does not. 
When the SNR is not noise limited, it no longer depends on Nt, so smoothing will have no effect. 
This may be seen in Figure H] (left panel) for the curves labeled 20, 50 and 100 ns which converge 
at high SNR for both smoothing values shown, N s = 1 and 30. In the GW dominated case, there 
is no dependence on N t , N s , or on the scaling exponent, x g . 

Figure 0] (right panel) shows the SNR plotted against the number of pulsars used for several 
different values of white and timing noise. The curves in the figure were calculated for M = 3 
blocks and using a red noise scaling a r oc T Xr with x r = 2. If red spin or ISM noise is absent 
(£m = 0)> S oc y/NtNp for S -C 1 is linear in the number of pulsars (for large N p ) but then has 
a shallower dependence S oc y/N p when S is large enough to provide a confident detection. If 
we let Sgw be the SNR for the GW-only case in Eq. (fT3|) . it can be shown that S <C S'gw in the 
white- noise limited case where the rjj, dominates other terms in Eq. (|12j) and S oc N p /rjM- Thus 
for S large enough to correspond to a plausible detection, it is not likely to scale with N p as it 
does in the noise-limited regime but instead will scale as S oc y^Np. 



2.4. Comparison with Other Detection Approaches 



Our method uses the CCF as a test statistic and we calibrate it against the SNR S that can be 
related to t heoretical GW spe ctra and sources of noise. Our expression for S in Eq. (fT2j) is similar to 
Eq. (12) of lJenet et al.l (|2005l ). who define their detection significance as the SNR Sj of a weighted 
correlation quantity, as discussed in § [2j However, their expression does not explicitly account 
for red and white noise individually; instead a quantity \ is defined that measures the degree of 
whiteness of the timing residuals. Any non-white components are assumed to arise solely from the 
GW background and not from any additional spin-noise or ISM contribution. As a consequence, 
we do not expect the expressions for S and Sj to yield the same values for the same PTAs. In 
addition, it is assumed that Sj has Gaussian statistics when there is no GW contribution, with a 
false-alarm fraction calculated accordingly. As we have shown, if red-noise contributes to timing 
residuals, the distribution of S is highly non-Gaussian both with or without a contribution from 
GWs. 

It is still instructive to compare nominal SNR values. For a pulsar timing array comprising 
N p = 20 p ulsars obser v ed N± = 250 times over T = 5 years with an rms error a n = 100 ns and no 



red noise, iJenet et al.l (|2005l ) (their Figure 1) find SNR w 2.8 for a GW background of the same 
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Number of Blocks (M) Number of Pulsars (N ) 

Fig. 4. — (Left) SNR of the cross-correlation function versus number of blocks for cases where 
the timing residuals include red noise from spin variations and/or ISM perturbations along with 
GWs and white noise. The plotted curves use a r oc T Xr with x r = 2 and are labeled with the 
values for T = 5 yr. For the six curves with 20 ns of white noise, three incorporate smoothing 
of N s = 30 samples and the other three have no smoothing (N s = 1). (Right) SNR of the cross- 
correlation function versus the number of pulsars in the timing array sample. Both red noise and 

1/2 

white noise are included, as indicated. Dashed lines indicate SNR oc N p and SNR oc N p ' , which 
are the asymptotic scaling laws at low and high SNR, respectively. The plotted curves assume 
that a r oc T Xr with x r = 2 and were calculated for M = 3 blocks and N t = 10 3 data points. 
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form we have con sidered with A = 10 -15 yr~ 2 / 3 . For the same PTA we find S = 2.9 for M = 1. 
Jenet et al.l (|2005l ) obtain Sj = 4.5 using low-pass filtering (smoothing) and pre- whitening. Their 
low-pass filtering uses a high-frequency cutoff fh c = 4/T, corresponding to N s = N t /8 in our 
notation. The blocking method we have analyzed shares some features similar to pre-whitening. 
Including smoothing and optimal blocking (which turns out to be M = 1 for this case), we obtain 
S = 3.1. As we have shown (Figure [3]), this value is sufficient to yield a high detection fraction 
(~ 0.99) with a corresponding fairly lo w false-alarm frac tion (0.02). Similar detection and false- 
alarm fractions are not available for the I Jenet et al.l (|2005l ) approach. We note also that our values 
apply for the compact configuration of pulsars whereas those for Jenet et al. are for an unspecified 
configuration. 

In a Bayesian treatment of GW detection, van Haasteren et al. (2009) cast detection in terms of 
parameter estimation and define the detection significance for the coefficient A of the GW spectrum 
using the SNR of A, fx/cr, which we denote as S v h- For a PTA with N p = 20, T = 5 yr and 2Vt = 500, 
their Figure 12 indicates S v n « 2 and 4 for own = 100 ns an d 50 ns, res pectively. For the same 



cases, we find S = 3.2 and 3.8, respectively, with no smoothing or blocking. Ivan Haasteren &; Levin 
(|2010l ) state that their results are based on fixing all but one parameter (A) and thus yield larger 
than expected SNRs. We con clude that, nominally, our SNRs are not inconsistent with those of 
(jvan Haasteren Levin! l2010l ). We emphasize, however, that the most meaningful comparison is 
of detection and false- alarm fractions. 



3. Climbing Mount Significance: Optimizing Detection 

Methods for increasing the detection signficance can be identified by using Eq. (|12p in various 
limiting cases. In the white-noise limited regime, the composite quantity Z wn = N p \fN\W~ s / can 
be inspected. When red noise dominates the SNR, the quantity Z m = N p \^M ja^. is relevant. In 
the GW-dominated regime, we have Z gw = <JN p M. 

1. Increasing the number of pulsars N p helps in any regime, though it has greater impact in the 
noise limited case (red or white) . Detection of GWs almost certainly will occur in the regime 
where the SNR of the correlation-based detection statistic increases only as the square-root 
of the number of pulsars. 

2. In the white noise-limited regime, increasing Z wn through a combination of more pulsars, 
greater timing throughput, smoothing, and a decrease in timing error per TO A will increase 
S. A detection fraction larger than 0.9 combined with a false-alarm fraction < 0.1 requires 
S>2. 

3. Because red noise is spectrally similar to the GW contribution and does not average out 
significantly in the CCF because of its long correlation time, the primary means for increasing 
the SNR is to sum over many pulsars and to use blocking. 

4. In the GW-dominated regime, increasing the blocking M and the number of pulsars N p are 
the only options. As the figures show, M cannot be increased arbitrarily because eventually 
the rms noise in the interval T/M will overwhelm the GW signal, which decreases with 
smaller T. 
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The sensitivity of a PTA to GWs of course improves with total observing span. In the white noise 
dominated case, these improvements are the greatest, with S oc 7 l2x 's+ 1 / 2 oc T 23//6 . However, a 
detection cannot be made in this regime where S is small. In the regime where detections can be 
made, the longer observing span only enables a larger amount of sub-blocking so that S oc \[T . 

Of the factors we have discussed, the con t ributions to red and wh ite noise from interstellar re- 
fraction and diffraction (jColes et al.1 120101 : ICordes &: Shannon! [201CJ) can be partly mitigated by 
using higher frequencies and by appropriate fitting across wide bandwidths. Larger telescopes and 
bandwidths can minimize radiometer noise but will have no effect on jitter. Longer integration 
times are the only recourse for jitter but they also will minimize radiometer noise 

The most difficult hindrance to overcome is red noise from spin variations and from any residual 
ISM effects that cannot be corrected. The range of red timing noise levels in MSPs is not known 
definitively but our recent assessment ((Shannon Cordesll201Cll ) suggests that it is larger in objects 
with larger spin-frequency derivatives. Latent red noise may emerge in many MSPs when more 
sensitive and longer time-span observations are obtained. If so, greater timing throughput will be 
needed to time more MSPs as well as to increase the observing time per pulsar to reduce white 
noise sufficiently for detection. 



3.1. Implications 

The GW dominated regime provides the largest SNR and thus indicates the absolute minimum 
number of pulsars needed to make a detection. Defining a threshold S > 5 m i n , the number of 
pulsars required is 

N p > AS 2 min w gg /ipM. (14) 

For ifj = 1, M = 1, and w gg = 0.4 a minimum SNR of 2 requires N p = 6 pulsars, as seen in Figured 
With no timing noise or white noise, M can be made arbitrarily large (subject to sampling rates). 
However, even optimistic values of white noise and red noise (e.g. 20 and 20 ns, respectively) 
indicate that S is maximized for M ~ 2 to 3, so N p > 7/tf) is needed for M = 3. The actual 
variance of the "Earth" part of the GW background could have ip much larger or smaller than 
unity, so detection of GWs with a small number of pulsars may be marginal. A larger threshold 
SNR m i n = 5 will require N p > for M = 3. Our results suggest that an increase in the 

cadence of timing measurements to increase the total number of TOAs for each object (At) may 
be a necessary but insufficient course to take for GW detection. 

To obtain our results, we have assumed that the Earth term e(t) is the same for all pulsars in a 
hypothetical spatial configuration. For realistic distributions of pulsars on the sky, the correlation 
amplitude will be reduced by ( 2 « 0.6, thus increasing the number of required pulsars by about a 
factor of 1/(C^) 2 « 3. For PTAs with a range of red-and-white-noise levels, weights for each pulsar 
can be introduced in the double sum in Eq. ([3]). For a nominal level of white noise, for example, 
with some objects having smaller and others larger rms values, a weighted sum will yield a larger 
SNR than the case where all objects have the same rms white noise. If we take the nominal value 
as the minimum in the sample, however, the optimally weighted C will have lower SNR. 
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Currently only two MSPs (J17 13+0747, J1909— 3744 ) are known to have rms timing residuals less 
than 50 ns oyer a 5 -yr interval (jDemorest et al.1 12003 ) and one other less than 100 ns, J0437— 4715 
([Manchester! 12010 ) . An aggressive campaign is needed to find more MSPs with timing noise 
substa ntially less than 100 ns in a 5-year span. The timing noise scaling law of IShannon &: Cordes 
(J2010|) suggests that such MSPs will have small spindown rates and they may be less luminous if 
the radio beam luminosity correlates with the energy loss rate. This implies that MSP surveys 
may need to be more sensitive than at present. 

The best strategy is to identify ~ 20 "super" -stable MSPs with rms timing noise of 20 ns or less 
over time spans of 5 years or more if a detection threshold S m i n = 2 is considered sufficient for 
detection. A larger S m i n = 5 requires ~ 50 objects. However, if no such super-stable objects exist 
and MSPs more typically have 20 ns rms timing noise or larger over 5 years, many more MSPs 
will need to be timed, perhaps exceeding 100 MSPs. 

Even if the super-stable regime applies, once a detection is made, possibly using existing telescopes 
to time known stable objects along with any new discoveries in the near term, a more detailed 
analysis of the GW spectrum will be desired and that certainly will require a much larger set of 
MSPs and overall greater throughput of the timing. 

Each MSP needs a careful error budget analysis. This would include a detailed characterization 
of the red and white noise levels, including a breakdown of each from different physical causes. 
The two kinds of no ise can be distinguished through appropriate use of the structure function of 
timing residuals (e.g. Cordes fc Downs)ll985 ). Departures from white noise need to be characterized 
according to amplitude and spectrum and classified as contributions from red noise of any kind, 
from changes in inst rumentation, which can cause ju mps in pulse phase between epochs. A change- 
point analysis (e.g. 6 Ruanaidh fc Fitzgerald 1996 . Chapter 5) on timing residuals can identify 
the amplitudes of such jumps whether or not their occurrence epochs are known. MSPs with 
significant red noise that is demonstrated to be from non-GW causes should be rejected because 
they do not contribute to the sensitivity of a PTA to a stochastic background of GWs. 



4. Discussion 

The main results of our paper are as follows. 

The cross-correlation function is the primary statistic that we consider for a hypothetical pulsar 
distribution that yields the maximum possible signal to noise ratio, all else being equal. For a 
realistic dis tribution, the equ ivalent quantity would be a weighted sum similar to the quantity p 



defined by (jJenet et al.ll2005l ). but with a time-lag argument included. The CCF has amplitudes 
over an ensemble that have a positive skewed distribution that influences the detection and false- 
alarm fractions. We have taken this into account in our analysis and we also suggest that sub- 
dividing the entire span of timing residuals into M ~ 3 sub-spans for each pulsar will reduce the 
skewness and increase the statistical significance. Such blocking is equivalent to high-pass filtering 
the data. It requires a large-enough cadence for TOA measurements that there is an ample number 
of samples within each interval of length T/M. 

The number of pulsars needed for a likely detection of nanohertz GWs depe nds strongly on the 



levels of white noise and especially the red noise in the data. In related papers (jCordes fc Shannon 
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2010 . Shannon Sz Cordes, in preparation) we have shown that both kinds of noise are likely to 
be present due to torque noise in the pulsar, magnetospheric motions of emission regions, and 
interstellar plasma phenomena. 

Red and white noise timing residuals dramatically alter the achievable signal to noise ratio of the 
CCF. When residuals are white noise dominated, improvements can be made by increasing the net 
integration time per pulsar or by smoothing individual measurements over a time shorter than the 
smallest GW period that is likely to be identified. Marginal gains can also be made from blocking 
of the data. 

If, however, the detection statistic is dominated by red noise with a power spectrum similar to 
that of the GW power spectrum, smoothing or other increases in net integration time per TOA 
will not help. The best recourse is to increase the number of pulsars in the pulsar timing array. 

We have shown that the correlated GW signal contributes variance that can differ markedly from 
the ensemble value if the GW signal has a steep power-law spectrum, like that expected from 
merging of supermassive black holes. This stochasticity of the sample variance can either greatly 
enhance or diminish the chances of detecting the signal. The skewness of the correlation function's 
signal to noise ratio is an important factor in assessing detection and false-alarm statistics. The 
skewness is reduced when time series are divided into sub-blocks that are analyzed separately and 
then combined. 

As mentioned in the previous section, an important action is the characterization of the timing 
error budget for each MSP. 

We consider it likely that 50 to 100 spin-stable MSPs are needed to fully characterize GWs at 
nanohertz frequencies, including a secure detection followed by detailed characterization. A mini- 
mum of 20 MSPs is needed for a plausible detection under optimistic red and white noise levels, as 
described in this paper. A sample that is distributed on the sky will increase this number by ~ 60% 
and verification with a completely independent sample will require another doubling. The pro- 
gram going forward therefore requires an aggressive search campaign to discover more MSPs and 
to identify the most spin-stable objects. It is possible that the most stable ob jects are also those 



with smaller radio luminosities. The scaling law for red noise identified by ([Shannon &: Cordes 



2010) implies that objects with larger spin-down rates have larger noise levels. While not known 



for certain, the radio luminosity likely also is larger for these objects. In addition, furt her study of 



spin n oise in MSPs is needed to ascertain whether it can be mitigated, as suggested by lLyne et al 



(|2010h . 
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A. Appendix 



A.l. Definitions and Correlation Time Scales 



The G W perturbations induced in a pair of pulsars are c orrela ted according to their angular sepa- 
ration (|Hellings &; Downslll983l ). Following I Jenet et al.l (|2005l ) and others, we define an estimator 
for the angular and temporal correlation function as an integral over time and a sum over the 
Nx = N p (N p — l)/2 unique pairs of pulsars, 



C(9,t) 



1 



N X (8)£>T 



— / dtxi(t)xj(t + r), 



(Al) 



where pairs are summed such that the separation angle 0y between the i and j pulsars is 
within an angular bin centered on 9. In practice, the time integral is a sum over discrete time 
series, but it is more useful for our analysis to use continuous notation. It is easy to collapse our 
continuous-time result to the discrete-time case as discussed below. 

In the main text we discuss the zero-lag correlation Coo = C(0,0) and its signal-to- noise ratio, 
S = C(0,0)/crg. For a general u(t), the rms Coo is 
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where ipa 2 ^ is the mean square of e(t) over the time interval and ip ~ 1 takes into account that 
e(t) is a single realization of the GW process while we define a gw to be the ensemble- aver age 
variance (see further discussion below). The uncorrelated noise u has variance a 2 .; w eu and w uu 
are dimensionless correlation times of order unity that are discussed below. If there are no GWs, 
we substitute a\w uu — > afw rr + (<r„ + 2of CT n)/-^ r t> where N t is the number of discrete samples in 
[0,T]. Using additional dimensionless correlation times for the no-GW case, we obtain 
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The rms correlation scales as the inverse of the number of pulsars and is independent of the number 
of time samples when the red noise dominates. When GWs are significant, the dependence on N p 
is slower, for large N p scaling as erg w 2a^ w ^w ep /N p . We have verified these scaling laws using 
simulations with different combinations of white and red noise and number of pulsars. 

For arbitrary combinations of GWs, red, and white noise, we expand u = p + r + n and use 
corresponding variances and correlation functions to get 
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The dimensionless time scales used in the expressions for ag result from double integrals that 
comprise the variance of the correlation function, 

T 
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where the factors in the integrand are normalized correlation functions for individual terms defined 
in Eq. ([2]), p a (t, t'), with a = e,p, r, n. The correlation functions have two arguments because most 
of the processes we consider have nonstationary-like statistics over finite time intervals. We have 
oi(T) =T-lfZ'dt(a?(t)) and p a (t,f) ee (a(t)a(t')) /a 2 x (T). 

We account for the fact that e(t) represents a single realization of the GW signal. Under the 
assumption that all pulsars in the sample are in the same direction, e(t) is identical for all lines of 
sight. By contrast, the GW signal at each pulsar's location, p(t), is different for each LOS so that 
a set of N p ^> 1 pulsars samples a range of variances that are well represented by the ensemble 
average. Therefore we write 

a 2 e (T)p e (t,t') = i>a 2 p (T)p p (t,t') = ^a 2 gw (T)p g (t,t'), (A6) 

where ip is the ratio of the realization variance to the ensemble variance for the GWs. 

Even though e(t) represents a single realization while the correlation estimator uses N p S> 1 
independent realizations of pit), we approximate both as having the same correlation function and 
same dimensionless correlation time, w gg . We therefore let w ep « w pp m w gg and 

The white-noise correlation p n decorrelates on a time scale of one sample in discretely sampled 
data. Therefore the factors w en ,w pn ,w rn , and w nn , all become 1/N t , the reciprocal of the number 
of time samples. This approach is a good representation of timing residuals with irregular, discrete 
sampling where N t is large enough to sample adequately the red processes, including the GW signal. 

In contast to white noise, the steep red power spectra for e,p and r yield dimensionless time scales 
S> l/N t . Using simulations like those described below we find w ep ~ 0.36 for red noise created 
with an / _13//3 spectrum after removal of a second order polynomial. For red noise consistent with 
timing noise in pulsars (oc /~ 5 ), we obtain w er ~ 0.44. For a flat spectrum (white noise), we verify 
that w nn = 1/N t to within statistical errors in simulations. 

Eq. (|A4p incorporates smoothing of the original time series by N s samples and for blocking of 
the total data span of Nt samples into subintervals of N/M samples corresponding to a time 
interval T/M. Each subinterval is processed separately and the correlation estimates are summed, 
reducing the rms oq by a factor \/\[M. Clearly, any smoothing and blocking must yield a net 
number of samples per subinterval to be large enough to allow for a polynomial fit. We have also 
used dimensionless variance ratios, tjm = a n/ a gw(T/M) and £m = of (T '/ 'M) / 'c7g W (T '/ M) . We have 
explicitly indicated that the variances for the GWs and red noise are functions of the data span 
length, T/M. We discuss these in detail in the main text. 

Figure [T] shows histograms of the rms timing perturbations before and after fitting a quadratic 
function for processes with spectral indices y = 2, 4, 6 (which correspond roughly to random walks 
in spin phase, frequency and frequency derivative) and y = 13/3, the value expected from a GW 
background produced by merging supermassive black holes (SMBHs). Histograms are also shown 
for pure white noise (y = 0). The ratio of rms values equals \p§ for the pre- fit case and is 
proportional to \p§ for the post fit case. The ratio varies by more than an order of magnitude 
before fitting but covers a somewhat smaller range after fitting. 
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Fig. 1. — Histograms of the rms TOA for red processes with power-law spectra oc f~ y with 
y = 0,2,4, 6 and 13/3, as labelled in the bottom panel. The top and bottom panels show results 
before and after removing a second-order polynomial. The rms values are normalized by the 
ensemble rms of the pre-fit time series. 

A. 2. Generation of Simulated Time Series 

We generate realizations of red noise by shaping complex white noise in the frequency domain 
and performing an inverse discrete Fourier transform. We fill an array that includes Fourier 
components with periods that are four times longer than our desired time series so that low- 
frequency components are not underestimated. We then select 1/4 of the time series. To suitably 
mimic the analysis of pulsar timing data, we subtract a straight line whose end points equal the 
first and last data points. This accounts for the fact that prior to doing a least-square fit to 
timing data, a preliminary timing model is first removed. In this way we compare pre-and-post fit 
variances that are close to representing those that would result in actual applications. 

A. 3. Non-Gaussianity of the Zero-lag CCF 

In the main text we describe the skewness of the distribution of Coo toward positive values. The 
skewness is generic for time series that include a red-noise component. Two effects lead to this 
result. First, inspection of Eq §3§ shows that the correlation function for a single pair of pulsars 
is an integral (or sum) of the products of two time series. The CLT will apply if each time 
series includes many independent fluctuations over the interval [0, T]. The sum over all pairs will 
also satisfy the conditions for the CLT and Cqo will have a Gaussian distribution. For red noise 
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processes, however, each time series is dominated by of order only one fluctuation, so the CLT will 
not apply to the single-pair integral. The second effect is that when the CLT does not apply to 
the CCF for a single pair it also does not apply to the sum over all pairs, in part because a given 
time series contributes to N p — 1 terms in the sum and the terms are not independent. 
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